Extracting Chinese polarity shifting patterns from massive text corpora
Authors
Abstract
In sentiment analysis, polarity shifting refers to changing the polarity of a sentiment clue, i.e., an expression of emotion, evaluation, and the like. Extracting polarity shifting patterns from corpora is harder than many other natural language processing (NLP) tasks: polarity can be shifted in highly flexible ways, which often causes fully automatic approaches to fail. In this study, we aimed to extract polarity shifting patterns that invert, attenuate, or cancel polarity, and we used a semi-automatic approach based on sequence mining. This approach greatly reduced the cost of human annotation while covering as many frequent polarity shifting patterns as possible. We tested it on corpora from different domains and under different settings. This paper reports and analyzes the results of three types of experiments.
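The abstract does not spell out the mining algorithm, so the following is only a minimal sketch of what a candidate-generation step in such a semi-automatic pipeline might look like. It assumes pre-segmented sentences, a given set of sentiment clue words, and that shifters appear to the left of the clue. It counts ordered, possibly gapped token subsequences preceding each clue and keeps those above a support threshold, leaving the inverting, attenuating, or canceling label to a human annotator. The function `candidate_patterns`, the `<CLUE>` placeholder, and the `window`, `max_len`, and `min_support` parameters are hypothetical, not the authors' interface.

```python
from collections import Counter
from itertools import combinations

CLUE = "<CLUE>"  # hypothetical placeholder marking the sentiment clue's position


def candidate_patterns(sentences, clue_words, window=3, max_len=2, min_support=2):
    """Return (pattern, support) pairs for ordered, possibly gapped token
    subsequences found in the left context of a sentiment clue.

    Support counts clue occurrences: a pattern is counted at most once
    per occurrence, however many ways it can be matched there.
    """
    support = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in clue_words:
                continue
            left = tokens[max(0, i - window):i]  # up to `window` tokens before the clue
            seen = set()
            for n in range(1, max_len + 1):
                # combinations() preserves left-to-right order, allowing gaps
                for idx in combinations(range(len(left)), n):
                    seen.add(tuple(left[j] for j in idx) + (CLUE,))
            support.update(seen)
    frequent = [(p, c) for p, c in support.items() if c >= min_support]
    frequent.sort(key=lambda pc: (-pc[1], pc[0]))  # most frequent first
    return frequent


if __name__ == "__main__":
    sents = [["这", "个", "不", "好"], ["服务", "不", "太", "好"],
             ["我", "不", "满意"], ["味道", "很", "好"]]
    print(candidate_patterns(sents, {"好", "满意"}))
    # [(('不', '<CLUE>'), 3)]
```

On this toy input only the negator 不 ("not") directly preceding a clue survives the support threshold; an annotator would then label it as an inverting pattern. Lowering `min_support` trades annotation cost for coverage, which is the balance the abstract describes.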
Similar resources
Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Polarity lexicons are a basic resource for analyzing the sentiments and opinions expressed in texts in an automated way. This paper explores three methods to construct polarity lexicons: translating existing lexicons from other languages, extracting polarity lexicons from corpora, and annotating sentiment in Lexical Knowledge Bases. Each of these methods requires a different degree of human effort...
An Alignment Based Technique for Text Translation between Traditional Chinese and Simplified Chinese
Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mappings from the parallel corpus. While building our system and collecting data, we observed that three types of translation approaches can be used....
Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems
Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are used for parallel text extraction instead. Most of existing methods for exploiting comparable corpora loo...
Evaluative Pattern Extraction for Automated Text Generation
Getting travel tips from experienced bloggers and online forums has become an important supplement to the travel guidebook in the web society. In this paper we present a novel approach that identifies and extracts evaluative patterns, providing a different linguistically motivated framework for automated evaluative text generation. We target domain-specific observation in online ...
A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
We present two problems for statistically extracting bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora based on arrival distances of words ...